Machine Translation with Minimal Reliance on Parallel Resources by George Tambouratzis, Marina Vassiliou & Sokratis Sofianopoulos

Author: George Tambouratzis, Marina Vassiliou & Sokratis Sofianopoulos
Language: eng
Format: epub
Publisher: Springer International Publishing, Cham


Candidates  Sequence of tokens (lemmatised)  Originating indexed file  TL corpus frequency  Matching score (%)
1           To a store procedure             PC/procedure_NN           3                    92.5
2           To an automate procedure         PC/procedure_NN           2                    92.5
3           In an ongoing process            PC/process_NN             12                   92.5

As can be seen, the first two entries are retrieved from the file containing PC-type phrases with “procedure” as their head, while the third comes from the file containing PCs headed by the word “process”. An exhaustive search of the two indexed files has shown that no exact match to the input phrase exists. The highest matching score is 92.5%, because in none of the three examined phrases is the lemma of the third token matched. Still, the 92.5% score is sufficiently high to form a sound basis for the translation. By contrast, if the score were below a user-defined threshold, typically chosen from the range of 75 to 90%, this translation would be rejected and the SL order of tokens in the phrase would be adopted. In addition, the frequencies of candidates 2 and 3 are comparable, differing by less than one order of magnitude.

As all retrieved phrases have equal matching scores, the winning phrase is the one with the highest frequency of occurrence in the TL monolingual corpus. In the current example, based on the contents of the fourth column, the chosen phrase is the third phrase of Table 3.3. This phrase is then used as the basis for translating the respective SL-side phrase, by replacing the token “ongoing” (which, according to the bilingual lexicon, is not an appropriate translation) with the token “automated”, which the lexicon suggests. The sequence obtained with this replacement, namely “in an automated process”, is the translation of this phrase and forms part of the final sentence translation.
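The selection logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Candidate, select_candidate and apply_lexicon are hypothetical, the threshold of 80.0 is simply one value from the 75 to 90% range mentioned in the text, and the position of the unmatched token is passed in by hand rather than computed. The candidate data are the three rows of the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    tokens: List[str]    # lemmatised TL tokens of the retrieved phrase
    source_file: str     # indexed file the phrase was retrieved from
    frequency: int       # frequency in the TL monolingual corpus
    score: float         # matching score in percent


def select_candidate(candidates: List[Candidate],
                     threshold: float = 80.0) -> Optional[Candidate]:
    """Pick the candidate with the highest score, breaking ties by
    TL corpus frequency. Return None if the best score is below the
    (assumed) user-defined threshold, in which case the caller would
    fall back to the SL token order."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: (c.score, c.frequency))
    return best if best.score >= threshold else None


def apply_lexicon(tokens: List[str], position: int, replacement: str) -> List[str]:
    """Replace the unmatched token with the lexicon-suggested one.
    The position is supplied by the caller (an assumption of this sketch)."""
    result = list(tokens)
    result[position] = replacement
    return result


# The three retrieved phrases from the table.
candidates = [
    Candidate(["to", "a", "store", "procedure"], "PC/procedure_NN", 3, 92.5),
    Candidate(["to", "an", "automate", "procedure"], "PC/procedure_NN", 2, 92.5),
    Candidate(["in", "an", "ongoing", "process"], "PC/process_NN", 12, 92.5),
]

winner = select_candidate(candidates)
# The third token (index 2) is the one whose lemma was not matched.
translation = apply_lexicon(winner.tokens, 2, "automated")
print(" ".join(translation))  # in an automated process
```

Sorting on the pair (score, frequency) keeps the tie-break in one place: frequency only matters when scores are equal, as in this example.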


